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LARGE  EDDY  SIMULATIONS  OF  SURFACE  WINDS 


LES  MODELING  OF  WINDS  OVER  WAVES 
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WAVE  AGE,  WIND-WAVE  ALIGNMENT  DEFINITIONS 


E 

=3 

k. 

u 

CU 

CL 

CO 

JC 

bp 

'<u 

JZ 

Q) 
> 
n 5 


CU 

JZ 

c 

~U 

<U 

cu 

CL 

to 

<U 

lO 

a 3 
jC 
Q. 

Q 

<U 

CL 


£ 

o 

r-H 

II 

4— » 

ro 

"ro 

3 

(O 

3 


<L> 

<D 

CL 

CO 

T3 

C 


CD 

>> 

JO 

CD 

U 

£ 

3 

CO 

<D 

U 

C 

CD 

k. 

£ 

CU 


u 

_o 

<u 

> 

c 

o 


(J 


<u 

>> 

JO 

<u 

u 

£ 

k. 

3 

CO 


T  t  t 


o 


# 


CD 

bO 

ro 

<U 

> 

ro 


# 

a 


o 

o 

♦ 


C 

o 

u 

CD 

k. 

t5 

c 

O 

ro 

bO 

ro 

CL 

O 


CU 

> 

ro 


T3 

C 

ro 

co 

T5 

C 


C 

<u 

cu 


<D 

JZ 

JD 

bO 

c 

ro 

T 


WAVE  AGE,  WIND-WAVE  ALIGNMENT  DEFINITIONS 


C 

o 


_  T? 
T3  Q 
C  ~ 
03  TO 

C/)  CD 
U  C 
C  0 

0 
«*— » 

O 

E 

0 


E 

3 


WAVE  STATE  FROM  THE  COUPLED  BOUNDARY 
LAYER  AIR-SEA  TRANSFER  EXPERIMENT 


Air-Sea  Interaction  Tower  near  Martha’s 
Vineyard  during  CBLAST 
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TURBULENT  FLOW  IN  THE  PBL  ABOVE  WAVES 

U-contours,  Wave  Age  >  2 
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FOR  VARYING  WAVE  AGE  AND  WIND-WAVE  ALIGNMENT 
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Estimation  of  ship  loads  needs  to  acknowledge  (account  for)  the 
stochastic  nature  of  the  wind,  wave,  and  current  fields  in  realistic 
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Expand  the  LES  database  of  solutions 
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WIND  VECTORS  AT  Z  =  10  METERS 


WINDS  OVER  WAVES:  EMPIRICAL  PARAMETERIZATIONS 
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WINDS  OVER  WAVES:  EMPIRICAL  PARAMETERIZATIONS 
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Technical  Objectives 

The  research  program  outlined  here  is  a  follow  on  to  the  mini-workshop  on  calculating  ship  motions  in 
geophysical  environments  held  in  early  2006  in  Washington,  DC.  The  long  term  goal  of  this  research  is  to 
further  the  present  understanding  of  turbulent  flow  over  surface  waves  and  the  coupling  with  ocean 
currents,  information  which  can  potentially  provide  guidance  for  ship  designs  and  motion  prediction  in 
extreme  sea  states.  In  the  near  term  the  research  focuses  on  examining  the  sensitivity  of  the  atmospheric 
surface  winds  (magnitude,  direction,  and  statistics)  to  varying  wind-wave  orientation  and  wave  age  (i.e., 
equilibrium  and  non-equilibrium  sea  states)  using  idealized  turbulence  resolving  large-eddy  simulations 
(LESs).  The  output  from  the  numerical  simulations,  statistics  and  visualization  of  surface-layer  winds, 
forms  a  database  which  can  then  be  used  to  identify  wind-wave  conditions  that  would  adversely  impact 
the  motion  of  ships  at  sea.  On  a  longer  timeline  algorithmic  improvements  to  our  LES  will  be  pursued  to 
allow  simulations  above  a  measured  spectrum  of  3D  surface  waves. 


Technical  Approach 

Our  technical  approach  to  the  interaction  problem  between  atmospheric  turbulence  and  surface  waves  is  a 
computational  one  that  relies  on  turbulence  resolving  LES.  It  builds  and  expands  on  work  currently 
funded  by  the  Physical  Oceanography  Section  of  ONR.  Currently,  we  are  using  LES  with  resolved 
surface  waves  to  aide  in  the  interpretation  of  observations  collected  during  the  low-wind  Coupled 
Boundary-Layers  Air-Sea  Transfer  (CBLAST)  field  campaign  (Edson  etal,  2007,  Sullivan  etal,  2007)  and 
in  the  new  ONR  High-Resolution  Air-Sea  Interaction  Departmental  Research  Initiative.  Details  can  be 
found  in  progress  reports  submitted  to  ONR  during  years  FY01  through  FY06.  Also  we  are  analyzing 
observational  data  from  a  unique  field  campaign  focused  on  improving  subgrid-scale  models  in  LES 
codes  (Sullivan  etal ,  2006). 

For  the  initial  stage  of  the  present  project  we  are  using  an  existing  LES  code  for  the  marine  planetary 
boundary  layer  (PBL)  with  the  novel  capability  to  impose  2D  moving  sinusoidal  modes  at  its  lower 
boundary.  The  computational  algorithm  uses  a  co-located  grid  and  solves  advection  equations  for 
Cartesian  (spatially  filtered)  velocity  components  u,  subgrid-scale  energy  e,  and  potential  temperature 
0  [further  details  are  given  in  Sullivan  etal  (2007)].  The  code  is  used  to  simulate  stratified  atmospheric 
turbulence  driven  by  variable  geostrophic  pressure  gradients  in  the  presence  of  monochromatic  surface 
waves  of  varying  amplitude  (or  waveslope  ak),  phase  speed  c,  and  vector  orientation  between  winds  and 
waves.  Important  parameters  used  to  classify  the  state  of  winds  and  waves  are  then  the  wave  age  c/\Ug\, 
where  Ug  is  the  geostrophic  wind  vector,  and  the  orientation  (alignment)  between  Ug  and  the  wave 
propagation  direction.  A  database  of  LES  solutions  with  systematic  variations  in  wave  age  and  wind- 


wave  alignment  can  then  be  built.  In  order  to  reduce  the  heavy  computational  expense  new  solutions  are 
generated  using  restarts  from  a  seed  solution  of  a  low-wind  strongly  dominated  swell  regime  (Sullivan 
etal ,  2007).  The  latter  was  run  to  a  quasi-stationary  (statistically  steady)  state.  LES  volumes  are  used  to 
generate  turbulence  statistics,  e.g.,  mean  wind  speed,  turbulence  variances,  and  for  flow  visualization. 

Progress  Statement  Summary 

A  large-eddy  simulation  (LES)  code  for  the  marine  atmospheric  boundary  layer  with  the  capability  to 
impose  2D  sinusoidal  moving  modes  at  its  lower  boundary  was  used  to  study  the  interaction  between  the 
atmospheric  winds  and  the  wavefield.  During  the  past  year  8  new  LES  solutions  were  generated  with 
variations  in  wave  age  and  wind-wave  alignment.  Results  from  these  simulations  show  that  atmospheric 
winds  (means  and  instantaneous  fields)  respond  in  unique  ways  depending  on  the  character  of  the  wave 
field.  When  the  wavefield  is  dominated  by  swell  upward  momentum  transport  from  the  ocean  to  the 
atmosphere  can  create  a  low-level  wind  maximum.  However  if  swell  opposes  the  wind,  the  wave  field 
acts  similar  to  stationary  roughness  slowing  the  surface  layer  winds  and  generating  high  levels  of 
turbulent  fluctuations.  The  response  of  the  surface-layer  winds  to  growing  seas  is  similar  to  flow  over 
stationary  roughness  elements  but  depends  on  wind-wave  alignment. 

Progress 

Our  previous  computational  work  (Sullivan  etal,  2007)  as  well  as  observations  (e.g.,  Donelan  etal,  1997) 
find  that  the  structure  of  the  marine  PBL  and  in  particular  the  surface  layer  winds  depend  on  the  state  of 
the  wave  field.  In  situations  where  the  wave  field  is  propagating  faster  than  the  surface  wind,  i.e.,  the 
wavefield  is  dominated  by  swell,  unique  interactions  occur  that  include:  development  of  a  low-level  wind 
maximum  and  upward  momentum  transport  from  the  wavefield  to  the  winds.  These  features  are  in 
contrast  to  situations  where  the  wave  field  is  growing  under  the  action  of  the  wind,  i.e.,  young  developing 
seas. 


Run 

Geostrophic  wind 
vector  (Ug,Vx)  (m/s) 

Wave  propagation 
direction  0  (degrees) 

Wave  age 

c/m 

i 

(5,0) 

0 

>2 

2 

(5,5) 

0 

>  2 

3 

(5,0) 

180 

>2 

4 

(5,5) 

180 

>2 

5 

(12.5,0) 

0 

~1 

6 

(12.5,12.5) 

0 

~1 

7 

(12.5,0) 

180 

=1 

8 

(12.5,12.5) 

180 

~1 

9 

(12.5,0) 

0 

=1 

Table  1 :  Wind  and  wave  properties  of  the  LES  database 

In  order  to  further  study  the  interactions  between  surface  layer  turbulence  and  waves  we  expanded  our 
existing  LES  database  by  generating  8  new  LES  solutions  during  the  past  year.  These  new  runs  vary  wave 
age  and  wind-wave  alignment  by  adjusting  the  horizontal  components  of  the  geostrophic  wind  as  shown 
in  Table  1.  In  all  runs  the  boundary  layer  is  driven  by  constant  geostrophic  winds  with  zero  surface 
heating  typical  of  a  near-neutral  marine  atmospheric  PBL.  The  3D  computational  domain  is 
(1 200x1 200x800)m  discretized  with  (250x250x96)  gridpoints.  A  surface-fitted  mesh  is  utilized  and  to 
concentrate  resolution  near  the  wave  a  stretched  vertical  grid  with  vertical  spacing  Az  =  1  m  at  the  surface 
is  employed.  For  all  simulations  the  imposed  surface  wave  has  wavelength  X  =  100m,  phase  speed  c  = 


12.5m/s,  and  waveslope  ak  =  0.1  where  the  wavenumber  k  =  2n/X  and  a  is  the  wave  amplitude.  Run  9 
examines  the  sensitivity  to  wave  amplitude  as  it  has  steeper  waves  with  ak  =  0. 1 5.  Note  that  in  Table  1  for 
runs  [1,2,5, 6, 9]  the  waves  are  propagating  with  the  w-component  of  the  surface  wind  (i.e.,  the  wave 
propagation  direction  0  =  0  degrees)  while  in  runs  [3,4,7, 8]  the  waves  are  opposing  the  ^-component  of 
the  surface  wind  (i.e.,  0  =  180  degrees).  Statistics  are  obtained  by  spatial  averaging  in  horizontal  planes 
and  by  time  averaging.  Further  computational  details  are  provided  in  Sullivan  etal( 2007). 

Extensive  flow  visualization  and  animations  of  the  LES  solutions  are  used  to  investigate  interactions 
between  marine  PBL  winds  and  the  imposed  surface  wave.  An  illustration  of  the  impact  of  wave  age  and 
wind-wave  alignment  on  the  instantaneous  horizontal  wind  field  uh  is  shown  in  Figure  1.  It  is  readily 
apparent  that  the  structure  of  the  PBL  wind  fields  and  in  particular  the  surface  winds  are  dependent  on 
wave  state.  In  the  case  dominated  by  swell  (wave  age  >  2)  the  wave  signature  is  strongly  impressed  on  the 
winds  and  pockets  of  super-geostrophic  wind  speed  are  observed  in  the  wave  troughs;  a  signature  of 
wave-driven  winds.  Meanwhile  for  growing  seas  (wave  age  ~  1),  elongated  streaks  appear  and  are  the 
dominant  coherent  structure  in  the  PBL  surface  layer;  in  this  particular  simulation  the  surface  low-speed 
streaks  are  rotated  in  the  computational  domain  mimicing  the  orientation  of  the  geostrophic  wind  vector 
(see  Table  1).  Similar  streaks  are  observed  in  neutrally  stratified  LES  over  stationary  roughness  (Moeng 
&  Sullivan,  1994)  which  suggest  that  flow  over  growing  seas  has  some  similarity  to  flow  over  roughness. 

Figures  2  and  3  quantify  the  impact  of  wind-wave  alignment  and  wave  age  on  the  surface  layer  mean 
wind  and  turbulent  fluctuations.  In  these  figures,  the  horizontal  winds  (uh)  are  interpolated  to  a  standard 
10  m  height  above  the  surface.  Wind-wave  alignment  is  observed  to  have  an  important  impact  on  the 
magnitude  of  the  mean  surface  wind,  see  Figure  2.  For  wave  age  >  2  with  winds  and  waves  propagating 
in  the  same  direction  notice  the  formation  of  a  low-level  wind  maximum  <«*)/]  Ug\  >  1  and  a  slight 
rightward  rotation  of  the  wind  vector  compared  to  the  geostrophic  wind  vector.  These  effects  are  a 
consequence  of  upward  momentum  transfer,  i.e.,  in  the  opposite  sense  from  the  waves  to  the  winds  as 
discussed  by  Sullivan  etal  (2007).  However  if  the  wave  propagation  direction  is  reversed  so  that  the 
winds  and  waves  are  opposed  then  the  underlying  wave  field  acts  similar  to  stationary  roughness;  the 
waves  induce  negative  vertical  momentum  transfer  (aV  <  0)  which  slows  the  surface  winds.  For 
geostrophic  winds  aligned  at  an  angle  of  45  degrees  to  the  surface  waves  the  mean  wind  speeds  are 
observed  to  be  bound  by  the  two  extremes  of  aligned  and  opposing  winds  and  waves.  For  run  2,  the 
surface  winds  are  distinctly  rotated  to  the  right  of  the  geostrophic  wind  vector  an  indication  of  upward 
momentum  transfer  from  waves  to  wind.  The  above  trends  depend  strongly  on  the  state  of  wave 
development,  i.e.,  wave  age.  For  growing  seas  (wave  age  =  1)  the  surface  waves  always  act  as  drag 
elements  similar  to  stationary  roughness  irrespective  of  the  wind-wave  alignment.  The  surface  winds  are 
always  slower  than  the  geostrophic  wind,  but  the  magnitude  of  the  mean  wind  is  dependent  on  the  wind- 
wave  alignment.  An  increase  in  waveslope  (run  9)  does  not  appear  to  alter  these  trends. 

It  is  expected  that  wind  gusts  (turbulence)  also  play  an  important  role  in  estimating  ship  motions.  Figure  3 
shows  how  the  horizontal  turbulence  intensity  varies  with  wave  age  and  wind-wave  alignment.  Notice  the 
magnitude  of  the  wind  fluctuations  exhibits  an  opposite  trend  compared  to  the  mean  wind;  the  horizontal 
turbulent  fluctuations  are  largest  (smallest)  in  the  case  of  opposing  (following)  winds  and  waves.  In 
particular  the  largest  turbulent  fluctuations,  as  a  fraction  of  the  mean  wind,  occur  for  wave  age  >  2  with 
opposing  winds  and  waves. 

In  the  open  ocean  winds  and  waves  are  often  in  dis-equilibrium;  the  wave  field  can  be  moving  faster  or 
slower  and  at  angles  to  the  surface  winds.  The  present  LES  results,  statistics  and  instantaneous  flowfields, 
illustrate  that  the  surface  layer  winds  (means  and  fluctuations)  respond  in  unique  ways  depending  on  the 
character  of  the  underlying  wave  field. 


References 


Edson,  J.,  T.  etal,  2007:  The  coupled  boundary  layers  and  air-sea  transfer  experiment  in  low  winds 
(CBLAST-Low),  Bulletin  of  the  American  Meteorological  Society,  88,  342-356. 

Donelan,  M.  A.,  W.  M.  Drennan,  &  K.  B.  Katsaros,  1997:  The  air-sea  momentum  flux  in  conditions  of 
wind  sea  and  swell.  Journal  of  Physical  Oceanography,  27,  2087-2099. 

Moeng,  C-H.  &  P.  P.  Sullivan,  1994:  A  comparison  of  shear  and  buoyancy  driven  planetary-boundary- 
layer  flows.  Journal  of  the  Atmospheric  Sciences,  51,  999-1022. 

Sullivan,  P.  P.,  J.  B.  Edson,  T.  Hristov,  &  J.  C.  McWilliams,  2007:  Large  eddy  simulations  and 
observations  of  atmospheric  marine  boundary  layers  above  non-equilibrium  surface  waves.  Journal  of  the 
Atmospheric  Sciences ,  submitted. 

Sullivan,  P.  P„  J.  B.  Edson,  T.  W.  Horst,  J.  C.  Wyngaard,  &  M.  Kelly,  2006:  Subfilter  scale  fluxes  in  the 
marine  surface  layer:  Results  from  the  Ocean  Horizontal  Array  Turbulence  Study  (OHATS).  17,h 
Conference  on  Boundary  Layer  and  Turbulence,  San  Diego,  CA. 


Figure  1:  Visualization  of  instantaneous  horizontal  wind  fields  uh  in  the  presence  of  a  moving 
surface  wave.  Upper  panel  is  simulation  run  1  with  wave  age  >  2  for  winds  following  waves,  while 
the  lower  panel  is  simulation  run  6  with  wave  age  =  1.  In  each  figure  the  horizontal  plane  is  at  z  = 
10  m.  The  winds  are  normalized  by  the  magnitude  of  the  geostrophic  wind  and  the  color  bar  is  in 
units  of  uh/\Ug\.  For  visualization  purposes  the  images  are  stretched  in  the  vertical  direction  and 
the  white  mesh  lines  denote  the  wave  surface.  The  horizontal  extent  of  the  domain  is  1200  m  in  the 
x  and  y  directions  and  135  m  in  the  z  direction.  Note  the  super-geostrophic  winds  in  the  upper 
panel  and  the  strong  signature  of  the  underlying  wave  in  the  wind  field. 


Wave  age  >  2 


Wave  age  » 1 


90  90 


Figure  2:  Effect  of  wave  age  and  wind-wave  alignment  on  the  mean  horizontal  wind  («/,)  at  z  = 
10m.  The  surface  winds  are  normalized  by  the  geostrophic  wind  magnitude  \U„\.  Colored  and 
black  lines  indicate  the  magnitude  and  orientation  of  the  surface  and  geostrophic  winds, 
respectively,  and  the  direction  of  wave  propagation  is  shown  in  the  legend.  Results  for  situations 
dominated  by  swell  (wave  age  >  2)  and  growing  seas  (wave  age  ~  1)  are  presented  in  the  left  and 
right  panels,  respectively. 


Wave  age  >  2 
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Figure  3:  Effect  of  wave  age  and  wind-wave  alignment  on  the  magnitude  of  the  horizontal  wind 
turbulence  [root-mean-square  (rms)  values]  at  z  =  10m  for  the  same  cases  as  in  Figure  2.  The  rms 
wind  fluctuations  «*'  are  normalized  by  the  geostrophic  wind  magnitude  \Ut\.  The  legend  shows 
the  direction  of  wave  propagation. 
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11B.5  A  HIGHLY  PARALLEL  ALGORITHM  FOR  TURBULENCE  SIMULATIONS  IN 
PLANETARY  BOUNDARY  LAYERS:  RESULTS  WITH  MESHES  UP  TO  10243 


Peter  P.  Sullivan*  and  Edward  G.  Patton 

National  Center  for  Atmospheric  Research,  Boulder,  CO 


1.  INTRODUCTION 

Petascale  computing  (eg.,  UCAR/JOSS,  2005)  has 
the  potential  to  alter  the  landscape  of  turbulence  simu¬ 
lations  in  planetary  boundary  layers  (PBLs).  Increased 
computer  power  using  0(  1 04  -  1 05 )  or  more  processors 
will  permit  large-eddy  simulations  (LESs)  of  turbulent 
flows  over  a  wide  range  of  scales  in  realistic  outdoor 
environments,  for  example,  flow  over  hills,  atmosphere- 
land  interactions  (Patton  et  al.,  2005),  boundary  layers 
with  surface  water  wave  effects  (Sullivan  et  al.,  2008, 
2007),  and  weakly  stable  nocturnal  flows  (Beare  et  al., 
2006)  to  mention  just  a  few.  However,  computational  al¬ 
gorithms  need  to  evolve  in  order  to  utilize  the  large  num¬ 
ber  of  processors  available  in  the  next  generation  of  ma¬ 
chines.  Here  we  briefly  describe  some  of  our  recent  de¬ 
velopments  focused  on  constructing  a  massively  parallel 
large-eddy  simulation  (LES)  code  for  simulating  incom¬ 
pressible  Boussinesq  atmospheric  and  oceanic  boundary 
layers.  The  performance  of  the  code  is  evaluated  on  vary¬ 
ing  meshes  utilizing  as  many  as  16,384  processors.  As  an 
application,  the  code  is  used  to  examine  the  convergence 
of  LES  solutions  for  a  daytime  convective  PBL  on  grids 
varying  from  323  to  10243. 

2.  2-D  DOMAIN  DECOMPOSITION 

Typical  LES  model  equations  for  dry  Boussinesq 
boundary  layers  include  at  a  minimum:  a)  transport 
equations  for  momentum  pu;  b)  a  transport  equation  for  a 
conserved  buoyancy  variable  ( e.g .,  virtual  potential  tem¬ 
perature  0V);  c)  a  discrete  Poisson  equation  for  a  pres¬ 
sure  variable  k  to  enforce  incompressibility;  and  clo¬ 
sure  expressions  for  subgrid-scale  (SGS)  variables,  e.g., 
a  subgrid-scale  equation  for  turbulent  kinetic  energy  e. 
In  our  LES  code  these  equations  are  integrated  forward 
in  time  using  a  fractional  step  method.  The  spatial  dis¬ 
cretization  is  second-order  finite  difference  in  the  ver¬ 
tical  direction  and  pseudospectral  in  horizontal  planes 
(Moeng,  1984).  Dynamic  time  stepping  utilizing  third- 
order  Runge-Kutta  with  a  fixed  Courant-Fredrichs-Lewy 
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(CFL)  number  (Sullivan  et  al.,  1996;  Spalart  et  al.,  1991) 
is  employed.  Evaluating  horizontal  derivatives  with  Fast 
Fourier  transforms  (FFTs)  and  solving  the  elliptic  pres¬ 
sure  equation  are  non-local  operations  which  impact  the 
parallelization. 

Our  previous  code  parallelizes  the  flow  model  de¬ 
scribed  above  using  a  single  domain  decomposition  pro¬ 
cedure  that  combines  distributed  memory  MPI  tasks 
(Aoyama  and  Nakano,  1999)  and  shared  memory  OMP 
threads  (Chandra  et  al.,  2001).  The  full  computational 
domain  is  naturally  first  decomposed  in  the  vertical  z  di¬ 
rection  using  MPI,  i.e.,  a  subset  of  vertical  levels  and 
full  horizontal  x—y  domains  are  assigned  to  each  com¬ 
putational  node.  Work  on  a  node  is  then  further  parti¬ 
tioned  amongst  local  threads  using  OMP  directives.  This 
scheme  has  some  advantages;  1)  it  does  not  split  FFTs 
across  spatial  directions  since  threads  share  the  same 
memory  and  thus  a  specialized  parallel  FFT  package  is 
not  required;  and  2)  it  can  utilize  the  architecture  of  ma¬ 
chines  with  large  numbers  of  processors  per  computa¬ 
tional  node  (e.g.,  the  IBM  SP5  with  16  processors/node). 
However  the  scheme  is  limited  on  computing  platforms 
which  have  few  processors/node  (e.g.,  the  Cray  XT4  with 
2  processors/node),  and  moreover  we  find  the  OMP  di¬ 
rectives  require  continual  maintenance  that  adds  over¬ 
head  and  complexity. 

To  streamline  the  code  and  increase  its  flexibility  a 
new  parallel  algorithm  is  designed  based  on  the  fol¬ 
lowing  criteria:  1)  accomplish  2-D  domain  decomposi¬ 
tion  using  solely  MPI  parallelization;  2)  preserve  pseu¬ 
dospectral  (FFT)  differencing  in  jc  —  y  planes;  and  3) 
maintain  a  Boussinesq  incompressible  flow  model.  The 
ability  to  use  2-D  domain  decomposition  has  been  shown 
to  be  a  significant  advantage  in  pseudospectral  simu¬ 
lation  codes  as  it  allows  direct  numerical  simulations 
of  isotropic  turbulence  on  meshes  of  20483  or  more 
(Pekurovsky  et  al.,  2006).  A  sketch  of  the  domain  de¬ 
composition  layouts  that  adhere  to  our  constraints  is 
given  in  figure  1.  We  mention  2-D  domain  decomposi¬ 
tion  in  x-y  planes  is  compatible  with  the  use  of  low- 
order  finite  difference  schemes  (Raasch  and  Schroter, 
2001)  and  mesoscale  codes  that  adopt  compressible 
equations  (Michalakes  et  al.,  2005). 

In  our  2-D  domain  decomposition,  each  processor  op- 


Figure  1:  2-D  domain  decomposition  on  9  processors:  (a) 
base  state  with  y-z  decomposition;  (b)  x  -  z  decomposi¬ 
tion  used  for  computation  of  y  derivatives  and  2-D  planar 
FFT;  and  (c)  x-y  decomposition  used  in  the  tridiagonal 
matrix  inversion  of  the  pressure  Poisson  equation. 


erates  on  constricted  three-dimensional  “bricks  or  pen¬ 
cils”  sub-sampled  in  x,  y  or  z  directions.  Brick-to-brick 
communication  is  a  combination  of  transposes  and  ghost 
point  exchange.  To  preserve  pseudospectral  differenc¬ 
ing  in  the  horizontal  directions  a  custom  MPI  matrix 
transpose  was  designed  and  implemented.  Note  other 
non-local  schemes,  e.g.,  compact  finite  difference  (Lele, 
1992)  or  fully  spectral  direct  numerical  simulation  codes 
(Weme  and  Fritts,  1999),  require  similar  communication 
patterns.  Our  transpose  routines  perform  the  forward  and 
inverse  operations 


shown  in  figure  la  and  lb.  In  (1)  and  following  equa¬ 
tions,  subscripts  (  )*,<?  denote  starting  and  ending  loca¬ 
tions  in  the  (x,y,z)  directions.  The  data  transpose  shown 
schematically  in  figure  la  and  lb  only  requires  local 
communication,  i.e.,  communication  between  proces¬ 
sors  in  groups  [0, 1,2],  [3,4.5],  and  [6,7,8].  Derivatives 
9//9y,  which  are  needed  in  physical  space,  are  computed 
in  a  straightforward  fashion  using  the  sequence  of  steps: 

1.  forward  x  to  y  transpose  /  — > >  fT\ 

2.  FFT  derivative  9/r/9y;  and  , 


all  x 

ys<  y  <ye 

zs  <  z  <ze 
fT(y,x,z) 


all  y 

Xs<  X  <xe 
Zs  <  z  <Ze 


(1) 


on  a  field  /  using  a  subset  of  horizontal  processors  as 


3.  inverse  y  to  x  transpose  9 fT /dy  —>9/ /9 y. 

Existing  serial  1-D  FFT  routines  for  real  and  complex  ar¬ 
rays  are  used  as  in  previous  implementations.  Note  with 
this  algorithm  so-called  ghost  points  used  in  computing 
derivatives  9//9z  are  only  needed  on  the  top  and  bottom 
faces  of  each  brick  in  figure  la. 

The  2-D  brick  decomposition  of  the  computational  do¬ 
main  also  impacts  the  pressure  Poisson  equation  solver. 


In  an  incompressible  Boussinesq  fluid  model  the  pres¬ 
sure  k  is  a  solution  of  the  elliptic  equation 

V2  7i  =  r,  (2) 


where  the  source  term  r  is  the  numerical  (discrete)  diver¬ 
gence  of  the  unsteady  momentum  equations  ( e.g .,  Sulli¬ 
van  et  al.,  1996).  The  solution  for  n  begins  with  a  stan¬ 
dard  forward  2-D  Fourier  transform  of  (2): 


d2ft 


r(kyikx,z)  with 


all  ky 

kxs  <  kx  5:  kxe  , 
zs  <  z  <ze 


(3) 


where  {kx,ky)  are  horizontal  wavenumbers.  At  this  stage 
the  data  layout  on  each  processor  is  as  shown  in  figure 
lb.  Next,  custom  routines  carry  out  forward  ky  to  z  and 
inverse  z  to  ky  matrix  transposes  on  the  source  term  of 
(3): 


r{ky,kx,z) 


all  ky 

kXs  —  kx  ^  kxe 

zs  <  z  <ze 


rr(z,kx,ky) 


all  z 

kxs  <  kx  <  kxe 

kys  ^  ky  ^  kyg 


(4) 


Again  notice  the  communication  pattern  needed  to  trans¬ 
pose  from  figure  lb  to  lc  is  accomplished  locally  by 
processors  in  groups  [0,3,6],  [1,4,7],  and  [2,5,8].  The 
continuous  storage  of  rT  along  the  z  direction  allows 
straightforward  tridiagonal  matrix  inversion  for  pairs  of 
horizontal  wavenumbers  on  each  processor.  This  step  is 
repeated  for  all  pairs  of  horizontal  wavenumbers  and  pro¬ 
vides  the  transposed  field  7tr(z,^xs  :kxe,kys  :kye).  To  re¬ 
cover  the  pressure  field  in  physical  space  we  retrace  our 
steps:  nT  — ►  ft  followed  by  an  inverse  2-D  Fourier  trans¬ 
form  it  — >71.  In  designing  the  present  algorithm,  we  also 
considered  using  the  parallel  tridiagonal  solver  described 
by  Gibbs  (2004)  for  the  solution  of  the  Poisson  equation 
but  found  it  not  well  suited  for  the  present  scheme. 

With  these  enhancements  our  new  algorithm  allows 
very  large  number  of  processors  O(104)  to  be  utilized. 
An  important  feature  of  the  algorithm  is  that  no  global 
MPI  ALLTOALL  communication  between  processors 
is  required.  Instead,  the  MPI  routine  SENDRECV  is 
wrapped  with  FORTRAN  statements  to  accomplish  the 
desired  communication  pattern.  The  scheme  outlined 
above  introduces  more  communication  but  the  messages 
are  smaller  and  hence  large  numbers  of  gridpoints  can  be 
used.  Also,  the  total  number  of  processors  is  not  limited 
by  the  number  of  vertical  gridpoints.  For  example,  this 


Figure  2:  Computational  time  per  gridpoint  for  different 
combinations  of  problem  size  and  2D  domain  decompo¬ 
sition  for  the  Cray  XT4  (an  example  of  strong  scaling),  a) 
green  lines  and  symbols  problem  size  5123;  b)  red  lines 
and  symbols  10243;  c)  black  lines  and  symbols  20483; 
and  d)  blue  symbol  30723.  For  a  given  number  of  to¬ 
tal  processors  NP  the  symbols  are  varying  vertical  and 
horizontal  decompositions,  i.e.,  different  combinations 

(Mi, in¬ 


flexibility  allows  simulations  in  boxes  with  large  hori¬ 
zontal  and  small  vertical  extents.  The  transpose  routines 
are  general  and  allow  arbitrary  numbers  of  mesh  points, 
although  the  best  performance  is  of  course  realized  when 
the  load  is  balanced  across  processors.  Single  files,  simi¬ 
lar  to  FORTRAN  direct  access  files,  are  written  and  read 
using  MPI  I/O  (Gropp  et  al.,  1998).  We  find  MPI  I/O 
makes  the  code  robust  across  different  machine  archi¬ 
tectures  and  simplifies  the  logic  required  for  restarts,  es¬ 
pecially  if  the  number  of  processors  changes  during  the 
course  of  a  simulation.  Finally,  the  code  is  compliant 
with  the  FORTRAN-90  programming  standard. 

The  performance  of  the  code  for  varying  workload  as 
a  function  of  the  total  number  of  processors  NP  is  pro¬ 
vided  in  figures  2  and  3  for  3  different  machine  archi¬ 
tectures.  NP  =  NPZ  x  NPxy  where  NPZ  and  NPxy  are  the 
number  of  processors  in  the  vertical  and  horizontal  direc¬ 
tions,  respectively.  In  each  figure,  the  vertical  axis  is  total 
computational  time  t  x  NP  divided  by  total  work.  Nz  is 
the  number  of  vertical  levels  and  Mxy  is  proportional  to 
the  FFT  work,  i.e.,  Mxy  =  Nxy\ogNxy  with  Nxy  the  num¬ 
ber  of  gridpoints  in  the  x  andy  directions.  Ideal  scaling 
corresponds  to  a  flat  line  with  increasing  number  of  pro¬ 
cessors.  The  timing  tests  illustrate  the  present  scheme  ex¬ 
hibits  both  strong  scaling  (problem  size  is  held  fixed  and 
the  number  of  processors  is  increased)  and  weak  scal¬ 
ing  (the  problem  size  grows  as  the  number  of  processors 
increases  so  the  amount  of  work  per  processor  is  held 
constant)  over  a  wide  range  of  problem  sizes  and  is  able 
to  use  as  many  as  16,384  processors,  i.e.,  the  maximum 
number  available  to  our  application  on  the  Cray  XT4. 
Further,  the  results  are  robust  for  varying  combinations 
of  (NPr.NPxy).  Generally,  the  performance  only  begins 


Figure  3:  Computational  time  per  gridpoint  for  a  fixed 
amount  of  work  per  processor  (an  example  of  weak  scal¬ 
ing).  Red,  green,  and  blue  lines  60,000  points/processor 
for  different  machines.  Cray  XT4  red  line;  Dual  core 
IBM  SP5+  green  line;  Single  core  IBM  SP5  blue 
line.  Black  lines  and  symbols  524,288  points/processor 
for  Cray  XT4.  For  a  fixed  number  of  total  proces¬ 
sors  NP  multiple  symbols  are  different  combinations  of 
(NP^NPxy). 


Table  1 :  Simulation  properties 


Run 

Gridpoints 

Zi  (m) 

z,/Az 

w*  (ms  1 ) 

A 

32  J 

1120 

17.5 

2.06 

B 

643 

1116 

34.9 

2.06 

C 

1283 

1123 

70.2 

2.06 

D 

2563 

1095 

137.0 

2.05 

E 

5123 

1088 

272.0 

2.04 

F 

10243 

1066 

536.7 

2.04 

to  degrade  when  the  number  of  processors  exceeds  about 
8  times  the  smallest  dimension  in  the  problem  owing  to 
increases  in  communication  overhead. 

3.  GRID  SENSITIVITY 

Parallel  codes  allow  one  to  simulate  PBLs  with  a  wider 
range  of  scale  interactions  and  external  forcings,  e.g., 
Jonker  et  al.  (1999)  and  Sullivan  et  al.  (2007).  Here, 
we  briefly  explore  one  aspect  of  this  much  larger  issue, 
viz.,  the  sensitivity  and  convergence  of  LES  solutions 
as  the  grid  mesh  is  varied.  Checking  numerical  conver¬ 
gence  of  LES  solutions  is  not  readily  addressed  in  usual 
LES  practice  since  the  computational  demands  needed 
to  carry  out  the  required  grid  studies  become  prohibitive 
for  a  3-D  time  dependent  turbulent  flow  (e.g.,  see  LES 
intercomparison  studies  by  Beare  et  al.  (2006),  Brether- 
ton  et  al.  (1999)  and,  Nieuwstadt  et  al.  (1993)).  A  series 
of  LES  on  a  fixed  computational  domain  with  grid  reso¬ 
lutions  varying  from  323  to  10243  are  performed  to  ex¬ 
amine  the  solution  convergence  and  flow  structures  us¬ 


ing  the  parallel  algorithm  described  in  Section  2.  For 
each  grid  resolution,  the  mesh  spacing  is  constant  in  the 
three  (x,y,z)  coordinate  directions.  A  canonical  daytime 
convective  PBL  is  simulated  in  a  computational  domain 
(XLJL,ZL)  =  (5120,5120,2048)  m.  The  PBL  is  driven 
by  a  constant  surface  heat  flux  Q+  -  0.24  Kms-1  and 
weak  geostrophic  winds  (Ug,Vg)  =  (1,0)  ms-1.  Other 
external  inputs  are  surface  roughness  z0  =  0.1m,  Corio¬ 
lis  parameter /=  1  x  104  s-1,  and  initial  inversion  height 
zi  ~  1000  m.  The  PBL  is  dominated  by  convection  since 
the  Monin-Obukhov  length  scale  L  <  - 1.5  m  and  thus 
the  metric  z,/Z,  =  O(-500)  (Moeng  and  Sullivan,  1994). 
All  simulations  are  started  from  small  random  seed  per¬ 
turbations  in  potential  temperature  near  the  surface.  The 
simulations  are  carried  forward  for  more  than  25  large 
eddy  turnover  times  T  =  z//w*.  The  convective  velocity 
scale  w*  =  (gQ*Zi/ O^)1/3  with  g  gravity  and  0O  a  refer¬ 
ence  potential  temperature.  See  Moeng  (1984),  Moeng 
and  Wyngaard  (1989)  and  Sullivan  et  al.  (1998)  for  a  fur¬ 
ther  description  of  the  simulation  design.  Bulk  properties 
of  the  simulations  are  given  in  Table  1 . 

4.  PRELIMINARY  RESULTS 
4.1  Flow  visualization 

Fine  mesh  simulations  allow  a  wider  range  of  large 
and  small  scale  structures  to  co-exist  and  thus  interact  in 
a  turbulent  flow.  Flow  visualization  in  figures  4  and  5  il¬ 
lustrates  the  formation  of  both  large  and  small  structures. 
In  figure  4,  we  observe  the  classic  formation  of  plumes 
in  a  convective  PBL.  Vigorous  thermal  plumes  near  the 
top  of  the  PBL  can  trace  their  roots  through  the  middle 
of  the  PBL  down  to  the  surface  layer.  Convergence  at 
the  common  comers  of  the  hexagonal  patterns  in  the  sur¬ 
face  layer  leads  to  the  formation  of  strong  updrafts  which 
evolve  into  large  scale  plumes  that  fill  and  dominate  the 
dynamics  of  the  daytime  PBL.  Near  the  inversion  a  de¬ 
scending  shell  of  motion  readily  develops  around  each 
plume. 

Closer  inspection  of  the  large  scale  flow  patterns  in  fig¬ 
ure  4  also  reveals  coherent  smaller  scale  structures.  This 
is  demonstrated  in  figure  5  where  we  track  the  evolu¬ 
tion  of  105  particles  over  about  1000  seconds.  Over  the 
limited  region  where  the  particles  are  released  the  flow 
is  dominated  by  a  persistent  line  of  larger  scale  upward 
convection.  On  either  side  of  the  convection  line  de¬ 
scending  motion  develops.  Near  the  surface  these  down- 
drafts  turn  laterally  and  converge.  The  outcome  of  this 
surface  layer  convergence  spawns  many  small  scale  ver¬ 
tically  oriented  vortices,  i.e.,  dust  devils.  The  rapidly 
rotating  vortices  are  readily  observed,  persist  in  time, 
and  rotate  in  both  clockwise  and  counterclockwise  di¬ 
rections.  Often  the  vortices  coalesce  in  a  region  where  a 
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Figure  4:  Visualization  of  the  vertical  velocity 
field  in  a  convective  PBL  at  different  heights 
from  a  5 1 23  simulation.  Plumes  near  the  inver¬ 
sion  can  trace  their  origin  to  the  hexagon  pat¬ 
terns  in  the  surface  layer.  The  color  bar  is  in 
units  of  ms-1. 
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coherent  thermal  plume  erupts.  Coarse  mesh  LES  hints  wmax: 

at  these  coherent  vortices  but  fine  resolution  simulations 

allow  a  detailed  examination  of  their  dynamics  within  a  £ 

larger  scale  flow.  Previously,  Kanak  (2005)  has  observed 

the  formation  of  dust  devils  in  convective  simulations, 

but  in  small  computational  domains  0(750)  m.  E 
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4.2  Statistics 

The  impact  of  mesh  resolution  on  typical  (normalized) 
turbulence  statistics,  viz.,  SGS  dissipation  e,  total  tur¬ 
bulent  kinetic  energy  £,  and  maximum  vertical  velocity 


is  shown  in  figure  6.  In  (5),  the  resolved  scale  veloc¬ 
ity  components  are  w,  =  (w,v,vv),  the  subgrid-scale  en¬ 
ergy  e  =  (uiUj  -  UiTTi)/ 2,  the  LES  filter  width  is  A/,  and 
Ce  ~  0.93  is  a  modeling  constant  (Moeng  and  Wyngaard, 
1988).  A  premise  of  LES,  and  also  the  basis  of  most  SGS 
modeling,  states  that  the  average  dissipation  is  constant 


Figure  5:  Visualization  of  particles  released  in  a  convective  PBL  at  z/z,-  ~  0.2  over  a  limited  horizontal  extent  from  a 
10243  simulation  of  convection.  The  viewed  area  is  ~  3.8%  of  the  total  horizontal  domain.  Notice  the  evolution  of  the 
larger  scale  line  of  convection  into  small  scale  vortical  dust  devils.  Time  advances  from  left  to  right  beginning  along 
the  top  row  of  images. 


if  the  filter  width  lies  in  the  inertial  subrange  range;  then 
e  ~  A^3  (Moeng  and  Wyngaard,  1988).  Similarly  the 
total  turbulent  kinetic  energy,  i.e.,  the  sum  of  resolved 
and  SGS  pieces,  also  tends  to  a  constant.  Figure  6  is  a 
test  of  this  hypothesis.  Notice  the  LES  solutions  con¬ 
verge  over  the  bulk  of  the  PBL  when  the  mesh  is  2563 
or  greater.  In  other  words,  these  low-order  LES  statis¬ 
tics,  for  this  particular  convective  PBL,  become  indepen¬ 
dent  of  the  grid  resolution  only  when  z,/A y  >  130.  This 
is  typically  finer  resolution  than  is  used  in  routine  cal¬ 
culations  of  free  convection.  It  is  encouraging  to  see 
only  small  changes  when  the  resolution  is  increased  from 
5 123  to  10243.  The  above  results  hint  that  the  SGS  model 
impacts  the  coarse  solutions  in  important  ways,  espe¬ 
cially  when  the  filter  width  approaches  the  energy  con¬ 
taining  scales  (Sullivan  et  al.,  2003). 

Moeng  and  Rotunno  (1990)  identify  the  vertical  ve¬ 
locity  skewness  Sw  as  a  critical  parameter  in  boundary 
layer  dynamics.  In  convective  PBLs,  Sw  is  an  indica¬ 
tor  of  the  updraft-downdraft  distribution,  provides  clues 
about  vertical  transport,  and  is  often  utilized  in  disper¬ 
sion  studies  (Weil,  1988,  1990).  Further,  Moeng  and  Ro¬ 
tunno  (1990)  find  vertical  velocity  skewness  is  sensitive 
to  the  structure  of  the  boundaries,  i.e.,  it  depends  on  the 
type  of  surface  boundary  conditions,  and  also  varies  with 
Reynolds  number  in  direct  numerical  simulations.  Hunt 
et  al.  (1988)  provides  a  brief  interpretation  of  the  skew¬ 
ness  variation  predicted  by  LES  in  the  surface  layer  of  a 
convective  boundary  layer. 


The  definition  of  the  vertical  velocity  skewness  is 


(w2)3/2 


(6) 


where  ( )  denotes  an  ensemble  average  and  w  is  the  total 
velocity.  In  order  to  examine  the  impact  of  grid  resolu¬ 
tion  on  Sw  we  analyze  the  solutions  from  the  different 
simulations  in  Table  1  with  the  caveat  that  we  use  the  re¬ 
solved  or  filtered  vertical  velocity  vv.  Hence  we  compute 
the  resolved  skewness 


c  _  (www) 

"  (vvvv)3/2  ’ 


(7) 


from  our  LES  solutions.  Recall  since  typical  LES  uses 
Smagorinsky  style  closures  with  subgrid-scale  fluxes  pa¬ 
rameterized  at  the  second  moment  level  subgrid-scale 
triple  moments  are  unknown  and  thus  there  is  not  a  clear 
definition  of  “subgrid-scale  skewness”  in  an  LES. 

Vertical  profiles  of  SW  are  shown  in  figure  7.  These 
profiles  exhibit  a  clear  and  striking  dependence  on  grid 
resolution;  near  the  surface,  z/z,  <0.15,  Sw  decreases 
and  eventually  becomes  (unrealistically)  negative  as  the 
grid  resolution  decreases.  Meanwhile  as  z/z/  — >  1  an  op¬ 
posite  trend  is  observed.  With  decreasing  grid  resolution 
Sw  becomes  more  positive  and  shows  a  pronounced  max¬ 
imum  below  the  inversion.  Away  from  the  lower  bound¬ 
ary,  0.05  <  z/zi  <  1,  the  skewness  estimates  appear  to 
converge  when  the  mesh  is  fine,  2563  or  greater.  No¬ 
tice  the  impact  of  grid  resolution  in  the  surface  layer. 


Figure  6:  Effect  of  mesh  resolution  z//Az  on  bulk  bound¬ 
ary  layer  turbulence,  a)  dissipation;  b)  total  TKE;  and 
c)  the  average  resolved  maximum  vertical  velocity.  In 
a)  and  b)  results  for  different  vertical  locations  z/z,-  = 
(0.1, 0.5. 0.9)  are  indicated  by  (red, black, blue)  curves, 
respectively. 


Figure  7:  Effect  of  mesh  resolution  on  vertical  velocity 
skewness  5W-  The  legend  indicates  the  resolution  of  the 
various  simulations.  Note  the  skewness  is  computed  us¬ 
ing  the  resolved  (or  filtered)  vertical  velocity  field  w.  Ob¬ 
servations  are  taken  from  the  results  provided  in  Moeng 
and  Rotunno  (1990). 


As  z//Az  increases  the  skewness  estimates,  especially 
with  meshes  51 23  and  10243,  are  in  good  agreement  with 
the  few  available  observations.  Above  z/z,  >  0.75,  we 
have  no  compelling  explanation  for  the  differences  be¬ 
tween  the  fine  mesh  LES  predictions  and  the  few  obser¬ 
vations,  but  note  that  the  presence  of  wind  shear  reduces 
the  skewness  (Fedorovich  et  al.,  2001).  There  is  an  ob¬ 
vious  need  for  more  observations  to  determine  whether 
this  discrepancy  is  due  to  limited  sampling  in  the  obser¬ 
vations  or  is  a  shortcoming  of  the  LES. 

The  grid  dependence  in  figure  7  invites  further  ex¬ 
ploration.  Some  speculative  explanations  are:  (1)  grid 
resolution  alters  the  structure  of  the  overlying  inversion. 
Coarser  grids  can  only  support  weaker  inversions  com¬ 
pared  to  fine  grids  and  perhaps  depends  on  inversion 
strength;  or  (2)  perhaps  the  small  scale  high  frequency 
content  of  (w3)  changes  sign  below  the  inversion  and 
thereby  reduces  the  magnitude  of  as  the  grid  is  re¬ 
fined. 

Our  current  interpretation  of  the  results  in  figure  7 
hinges  on  the  behavior  and  modeling  of  the  subgrid-scale 
fluxes  in  LES.  In  order  to  expose  this  dependence  we  first 
introduce  the  definitions  of  the  third  and  second  order 
SGS  moments 


=  W3  —  W3  =  WWW 

=  w2  —  w2  =  ww  - 


-  WWW, 
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As  in  usual  LES  practice  ( )  indicates  a  spatially  filtered 
variable  in  (8).  Under  the  assumption  that  the  filtering 
operator  commutes  with  ensemble  averaging,  e.g., 

(w3)  =  p)  =  (y) ,  (9) 

the  total  skewness  given  by  (6)  is  next  written  in  terms  of 
resolved  and  subgrid  contributions: 


(w3)  +  {()>) 

((&)  +  M)3/2 ' 


(10) 


Further  algebraic  manipulation  of  (10)  utilizing  (8)  leads 

=  sw(1  ~yl3/2  (11) 

where  S„  is  the  resolved-scale  skewness  (7)  and 


<t>  =  (<| >)/(w3^)  ,  (12a) 

V  =  <¥>/(»?)  ,  (12b) 

are  non-dimensional  SGS  moments.  (11)  is  useful  -  it 
defines  the  total  skewness  in  terms  of  LES  resolved  and 
subgrid-scale  variables.  As  might  be  expected,  the  sub¬ 
grid  contribution  to  the  total  skewness  involves  both  sec¬ 
ond  and  third  order  moments  of  vertical  velocity. 


¥ 


Figure  8:  Panel  a)  skewness  comparisons:  5123  simulation,  black  line;  5123  simulation  filtered  in  horizontal  planes 
to  642  resolution,  red  dotted  line;  and  643  simulation,  black  dashed  line.  Panel  b)  subgrid-scale  moments  computed 
from  5 1 23  simulation:  black  line;  y,  red  line;  and  the  SGS  skewness  correction  (1  -  \jf)3/2/(l  -  $)  [which  appears 
in  (11)],  blue  line. 


In  order  to  evaluate  the  importance  of  the  SGS  mo¬ 
ments  ($,y)  to  vertical  velocity  skewness  we  filtered  the 
5123  and  10243  simulation  results  to  produce  resolved 
and  SGS  variables  on  a  coarser  mesh.  This  step  is  justi¬ 
fied  since  the  LES  solutions,  as  shown  previously,  have 
effectively  converged  at  these  mesh  resolutions.  The  ver¬ 
tical  velocity  field  from  cases  E  and  F  are  filtered  in  hor¬ 
izontal  x  -y  planes  to  a  resolution  of  642  using  a  sharp 
spectral  filter  -  no  filtering  is  applied  in  the  z  direction. 
As  an  independent  check  on  the  processing  we  verified 
that  the  filtered  fields  satisfy  (1 1)  exactly. 

Vertical  profiles  of  skewness  and  SGS  moments  con¬ 
structed  from  the  filtered  5123  simulation  (referred  to 
as  case  Ef)  are  presented  in  figure  8.  Results  obtained 
from  filtering  the  10243  simulation  are  similar  but  dis¬ 
play  more  variability  due  to  less  averaging.  The  skew¬ 
ness  estimates  from  Ef  are  intriguing.  They  are  broadly 
similar  to  the  comparable  643  coarse  simulation  result, 
i.e.,  small  in  the  surface  layer  and  large  near  the  inversion 
but  exhibit  important  quantitative  differences.  In  the  sur¬ 
face  layer  the  skewness  from  case  Ef  is  always  positive 
except  very  near  the  ground,  in  contrast  to  simulation  B. 
This  is  in  agreement  with  our  physical  expectation.  Also 
the  skewness  from  Ef  matches  the  high  resolution  result 
in  mid-PBL.  The  SGS  moments  in  figure  8b  illustrate  the 
shortcomings  of  the  coarse  643  calculation.  In  the  sur¬ 
face  layer  the  triple_moment  $  is  very  large  contributing 
more  than  50%  to  (w3),  in  mid-PBL  ^  %  y,  and  near  the 


inversion  $  <  \y.  <j>  is  always  greater  than  zero.  Overall 
the  SGS  “correction”  to  skewness  given  by  the  ratio  on 
the  right  hand  side  of  (11)  is  >  4  in  the  surface  layer, 
~  1  in  mid-PBL,  and  falls  to  ~  0.8  near  the  inversion. 
We  mention  as  z  — ►  z,  the  strength  of  the  PBL  inversion 
might  also  alter  the  magnitude  of  the  SGS  moments  and 
their  relative  contributions  to  Sw.  As  noted  by  Hunt  et  al. 
(1988),  Smagorinsky  closures  are  Gaussian  models  and 
hence  assume  $  =  0.  As  a  consequence,  coarse  mesh 
LES  results  predict  erroneous  values  of  skewness  be¬ 
cause  of  their  SGS  closure  schemes.  In  general,  we  find 
coarse  mesh  LES  tends  to  overpredict  (vv3),  underpredict 
(w2),  and  thus  overpredict  S„  compared  to  fine  resolu¬ 
tion  simulations  as  shown  in  figure  9.  When  Smagorin¬ 
sky  closures  are  used  with  LES,  meshes  of  at  least  2563 
or  greater  are  needed  to  obtain  reliable  estimates  of  Sw. 
It  will  be  interesting  to  examine  vertical  velocity  skew¬ 
ness  from  LES  with  alternate  non-eddy  viscosity  closure 
schemes,  e.g.,  Wyngaard  (2004)  and  Hatlee  and  Wyn- 
gaard  (2007)  employ  rate  equations  for  the  SGS  fluxes 
and  variances. 

5.  SUMMARY 

A  highly  parallel  LES  code  that  utilizes  2-D  domain 
decomposition  and  retains  pseudospectral  differencing  in 
horizontal  planes  is  described.  The  code  exhibits  good 
scaling  over  a  wide  range  of  problem  sizes  and  is  capa- 
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Figure  9:  Comparison  of  third  and  second  order  vertical  velocity  moments  from  different  calculations.  5123  simula¬ 
tion,  black  line;  5123  simulation  filtered  in  horizontal  planes  to  642  resolution,  red  dotted  line;  and  643  simulation, 
black  dashed  line.  Panel  a)  normalized  (w3)/^3  and  panel  b)  normalized  (vv2) /w2. 


ble  of  using  as  many  16,384  processors  of  a  Cray  XT4. 
Flow  visualization  of  fine  mesh  5123  and  10243  simu¬ 
lations  of  a  convective  boundary  layer  shows  a  number 
of  intriguing  structural  features,  e.g.,  large  scale  plumes 
coupled  to  small  scale  vortical  dust  devils.  A  grid  sen¬ 
sitivity  study  of  a  canonical  daytime  convective  PBL 
shows  that  the  LES  solutions  converge  reasonably  well 
for  meshes  greater  than  or  equal  to  2563.  The  skewness 
of  vertical  velocity  S„  highlights  the  solution  sensitivity 
to  grid  resolution.  The  variations  of  S„  with  grid  resolu¬ 
tion  are  a  consequence  of  a  Smagorinsky  closure  which 
neglects  third-order  SGS  moments  of  vertical  velocity. 
Future  applications  of  this  parallel  code  include  high  res¬ 
olution  simulations  of  air-wave-water  interactions  under 
high  wind  conditions  and  PBL  couplings  with  surface- 
layer  vegetation. 


Acknowledgments:  We  thank  Chin-Hoh  Moeng,  Jeff 
Weil,  and  John  Wyngaard  for  their  insights  as  to  the 
variation  of  vertical  velocity  skewness.  P.  Sullivan  is 
partially  supported  by  the  Office  of  Naval  Research, 
Physical  Oceanography  and  Ship  Hydrodynamics  pro¬ 
grams.  E.  Patton  is  supported  by  the  Army  Research 
Office  (Grant:  MIPR5HNSFAR044)  and  from  the  Cen¬ 
ter  for  Multiscale  Modeling  of  Atmospheric  Processes 
(CMMAP)  at  Colorado  State  University,  NSF  grants 
ATM-0425247  and  contract  G-3045-9  to  NCAR.  Com¬ 
putational  time  for  the  10243  simulations  used  resources 


of  the  National  Energy  Research  Scientific  Computing 
Center,  which  is  supported  by  the  Office  of  Science  of 
the  U.S.  Department  of  Energy  under  Contract  No.  DE- 
AC02-05CH1 1231.  NCAR  is  sponsored  by  the  National 
Science  Foundation.  Any  opinions,  findings,  and  conclu¬ 
sions  or  recommendations  expressed  in  this  publication 
are  those  of  the  author(s)  and  do  not  necessarily  reflect 
the  views  of  the  National  Science  Foundation. 

REFERENCES 

Aoyama,  Y.  and  J.  Nakano,  1999:  RS/6000  SP:  Practical 
MPI  programming,  Technical  Report  IBM  Redbook 
SG24-5380-00,  International  Business  Machines. 

Beare,  R.  J.,  M.  K.  Macvean,  A.  A.  M.  Holtslag, 
J.  Cuxart,  I.  Esau,  J.-C.  Golaz,  M.  A.  Jimenez, 
M.  Khairoutdinov,  B.  Kosovic,  D.  Lewellen,  T.  S. 
Lund,  J.  K.  Lundquist,  A.  McCabe,  A.  F.  Moene, 
Y.  Noh,  S.  Raasch,  and  P.  P.  Sullivan,  2006:  An  in¬ 
tercomparison  of  large-eddy  simulations  of  the  stable 
boundary  layer,  Boundary-Layer  Meteorol. ,  118,  242- 
272. 

Bretherton,  C.  S.,  M.  K.  Macvean,  P.  Bechtold, 
A.  Chlond,  W.  R.  Cotton,  J.  Cuxart,  H.  Cuijpers, 
M.  Khairoutdinov,  B.  Kosovic,  D.  Lewellen,  C.-H. 
Moeng,  P.  Siebesma,  B.  Stevens,  D.  E.  Stevens, 
I.  Sykes,  and  M.  C.  Wyant,  1999:  An  intercompari¬ 
son  of  radiatively  driven  entrainment  and  turbulence 


in  a  smoke  cloud,  as  simulated  by  different  numerical 
models,  Quart.  J.  Roy.  Meteorol.  Soc .,  554,  391-423. 

Chandra,  R.,  L.  Dagum,  D.  Kohr,  D.  Maydan,  J.  Mc¬ 
Donald,  and  R.  Menon,  200 1 :  Parallel  Programming 
in  OpenMP ,  Academic  Press,  230  pp. 

Fedorovich,  E.,  F.  T.  M.  Nieuwstadt,  and  R.  Kaiser, 
2001:  Numerical  and  laboratory  study  of  a  horizon¬ 
tally  evolving  convective  boundary  layer.  Part  i:  Tran¬ 
sition  regimes  and  development  of  the  mixed  layer,  J. 
Atmos.  Sci .,  58,  70-86. 

Gibbs,  W.  R.,  2004:  A  parallel/recursive  algorithm,  J. 
Comp.  Phys.,  201,  573-585. 

Gropp,  W.,  S.  Huss-Lederman,  A.  Lumsdaine,  E.  Lusk, 
B.  Nitzberg,  W.  Saphir,  and  M.  Snir,  1998:  MPI-The 
Complete  Reference,  Volume  2,  The  MPI-2  Exten¬ 
sions,  The  MIT  Press,  344  pp. 

Hatlee,  S.  C.  and  J.  C.  Wyngaard,  2007:  Improved 
subfilter-scale  models  from  the  HATS  field  data,  J.  At - 
mos.  Sci. ,  64,  1694-1705. 

Hunt,  J.  C.  R.,  J.  C.  Kaimal,  and  J.  E.  Gaynor,  1988: 
Eddy  structure  in  the  convective  boundary  layer  -  new 
measurements  and  new  concepts,  Quart.  J.  Roy.  Mete - 
orol.  Soc.,  482,  827-858. 

Jonker,  H.  J.  J.,  P.  G.  Duynkerke,  and  J.  W.  M.  Cuijpers, 
1999:  Mesoscale  fluctuations  in  scalars  generated  by 
boundary  layer  convection,  J.  Atmos.  Sci.,  56,  801— 
808. 

Kanak,  K.  M.,  2005:  Numerical  simulation  of  dust  devil- 
scale  vortices,  Quart.  J.  Roy.  Meteorol.  Soc.,  131, 
1271-1292. 

Lele,  S.  K.,  1992:  Compact  finite  difference  schemes 
with  spectral-like  resolution,  J.  Comp.  Phys.,  103,  lb- 
42. 

Michalakes,  J.,  J.  Dudhia,  D.  Gill,  T.  Henderson, 
J.  Klemp,  W.  Skamarock,  and  W.  Wang,  2005:  The 
Weather  Research  and  Forecast  Model:  Software  ar¬ 
chitecture  and  performance,  in  Proceedings  of  the 
Eleventh  ECMWF  Workshop  on  the  Use  of  High 
Performance  Computing  in  Meteorology,  edited  by 
W.  Zwieflhofer  and  G.  Mozdzynski,  pp.  156-168, 
World  Scientific. 

Moeng,  C.-H.,  1984:  A  large-eddy  simulation  model  for 
the  study  of  planetary  boundary-layer  turbulence,  J. 
Atmos.  Sci.,  41,  2052-2062. 

Moeng,  C.-H.  and  R.  Rotunno,  1990:  Vertical  velocity 
skewness  in  the  buoyancy-driven  boundary  layer,  J. 
Atmos.  Sci.,  41,  1149-1162. 


Moeng,  C.-H.  and  P.  P.  Sullivan,  1994:  A  comparison  of 
shear  and  buoyancy  driven  planetary-boundary-layer 
flows,  J.  Atmos.  Sci.,  51,  999-1022. 

Moeng,  C.  H.  and  J.  C.  Wyngaard,  1988:  Spectral  analy¬ 
sis  of  large-eddy  simulations  of  the  convective  bound¬ 
ary  layer,  J.  Atmos.  Sci.,  45,  3573-3587. 

— ,  1989:  Evaluation  of  turbulent  transport  and  dissipa¬ 
tion  closures  in  second-order  modeling,  J.  Atmos.  Sci., 
46,  2311-2330. 

Nieuwstadt,  F.  T.  M.,  P.  J.  Mason,  C.  H.  Moeng, 
and  U.  Schumann,  1993:  Turbulent  Shear  Flows, 
Springer- Verlag,  43 1  pp. 

Patton,  E.  G.,  P.  P.  Sullivan,  and  C.-H.  Moeng,  2005: 
The  influence  of  idealized  heterogeneity  on  wet  and 
dry  planetary  boundary  layers  coupled  to  the  land  sur¬ 
face,  J.  Atmos.  Sci.,  62,  2078-2097. 

Pekurovsky,  D.,  P.  K.  Yeung,  D.  Donzis,  W.  Pfeiffer,  and 
G.  Chukkapallli,  2006:  Scalability  of  a  pseudospec- 
tral  DNS  turbulence  code  with  2D  domain  decompo¬ 
sition  on  Power4+/Federation  and  Blue  Gene  systems, 
in  ScicomP12  and  SP-XXL,  Boulder,CO. 

Raasch,  S.  and  M.  Schroter,  2001:  Palm  -  A  large-eddy 
simulation  model  performing  on  massively  parallel 
computers,  Meteorologische  Zeitschrift,  10,  363-372. 

Spalart,  P.  R.,  R.  D.  Moser,  and  M.  M.  Rogers,  1991: 
Spectral  methods  for  the  Navier-Stokes  equations  with 
one  infinite  and  two  periodic  directions,  J.  Comp. 
Phys.,  96,  297. 

Sullivan,  P.  P,  J.  B.  Edson,  T.  Hristov,  and  J.  C. 
McWilliams,  2008:  Large  eddy  simulations  and  obser¬ 
vations  of  atmospheric  marine  boundary  layers  above 
non-equilibrium  surface  waves,  J.  Atmos.  Sci.,  65, 
1225-1245. 

Sullivan,  P.  P.,  T.  W.  Horst,  D.  H.  Lenschow,  C.-H. 
Moeng,  and  J.  C.  Weil,  2003:  Structure  of  subfilter- 
scale  fluxes  in  the  atmospheric  surface  layer  with  ap¬ 
plication  to  large-eddy  simulation  modeling,  J.  Fluid 
Mech.,  482,  101-139. 

Sullivan,  P.  P,  J.  C.  McWilliams,  and  W.  K.  Melville, 
2007:  Surface  gravity  wave  effects  in  the  oceanic 
boundary  layer:  Large-eddy  simulation  with  vortex 
force  and  stochastic  breakers,  J.  Fluid  Mech.,  593, 
405-452. 

Sullivan,  P.  P,  J.  C.  McWilliams,  and  C.-H.  Moeng, 
1996:  A  grid  nesting  method  for  large-eddy  simu¬ 
lation  of  planetary  boundary  layer  flows,  Boundary- 
Layer  Meteorol.,  80,  167-202. 


Sullivan,  P.  P.,  C.-H.  Moeng,  B.  Stevens,  D.  H. 
Lenschow,  and  S.  D.  Mayor,  1998:  Structure  of  the 
entrainment  zone  capping  the  convective  atmospheric 
boundary  layer,  J.  Atmos.  Sci .,  55,  3042-3064. 

UCAR/JOSS,  2005:  Ad  hoc  committee  and  technical 
writing  group  for  a  petascale  collaboratory  for  the  geo¬ 
sciences,  Establishing  a  Petascale  Collaboratory  for 
the  Geosciences:  Scientific  Frontiers,  Technical  Re¬ 
port  http://www.joss.ucar.edu/joss.psg/meetings/ 
Meetings_2005/petascale/index.html,  UCAR/JOSS. 

Weil,  J.  C.,  1988:  Dispersion  in  the  convective  boundary 
layer,  in  Lectures  on  Air  Pollution  Modeling ,  edited  by 
A.  Venkatram  and  J.  Wyngaard,  pp.  167-227,  Ameri¬ 
can  Meteorological  Society. 

— ,  1990:  A  diagnosis  of  the  asymmetry  in  top-down  and 
bottom-up  diffusion  in  a  Lagrangian  stochastic  model, 
J.  Atmos.  Sci.,  47,  501-515. 

Weme,  J.  and  D.  C.  Fritts,  1999:  Stratified  shear  turbu¬ 
lence:  Evolution  and  statistics,  Geophys.  Res.  Lett., 
26,  439—442. 

Wyngaard,  J.  C.,  2004:  Toward  numerical  modeling  in 
the  Terra  Incognita,  J.  Atmos.  Sci.,  61,  1816-1826. 


